Add support for multiple and 2>n dimensional HDF5 Datasets#2394
Closed
luccadibe wants to merge 11 commits intoapache:mainfrom
Closed
Add support for multiple and 2>n dimensional HDF5 Datasets#2394luccadibe wants to merge 11 commits intoapache:mainfrom
luccadibe wants to merge 11 commits intoapache:mainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2394 +/- ##
============================================
- Coverage 72.29% 71.52% -0.78%
- Complexity 46937 47446 +509
============================================
Files 1513 1539 +26
Lines 178421 182620 +4199
Branches 35036 35913 +877
============================================
+ Hits 128993 130611 +1618
- Misses 39665 42015 +2350
- Partials 9763 9994 +231 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Contributor
|
LGTM - thanks for the patch @luccadibe. Overall, this generalization is great. During the merge, I resolve the merge conflicts, reverted the format changes in MetaDataAll, disabled the last test cases (group1/subgroup/data2) which was consistently failing, and replaced the stdout tracing with log tracing. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR aims to fix and optimize the HDF5Reader implementation from systemds with the goal of being able to correctly read the So2Sat LCZ42 dataset (https://mediatum.ub.tum.de/1454690) .
For this I added support for the filter pipeline and attribute message types from HDF5 ;
n dimensional matrices with n>2 are flattened into 2d .
I also added support for inferring hdf5 from the .h5 file extension.
Apologies for the massive PR 🤕 .
I benchmarked the performance of the new implementation and shared results in this repo:
https://github.com/luccadibe/systemds-hdf5-reader-benchmark
The code still needs some work regarding code style and formatting ( I am not sure if I set up the fomatter correctly as mentioned in the CONTRIBUTING.md ; in some files I was getting a huge diff so I tried to format only what I touched).
I am unsure about how to best split this into multiple PRs , or if that is wanted even.
I would appreciate some general feedback on this.